Kurator: A Kepler Package for Data Curation Workflows

Authors

  • L. Dou
  • G. Cao
  • Paul J. Morris
  • Robert A. Morris
  • B. Ludäscher
  • James A. Macklin
  • James Hanken
Abstract

Data curation is critical for scientific data digitization, sharing, integration, and use. This paper presents Kurator, a software package for automating data curation pipelines in the Kepler scientific workflow system. Several curation tools and services are integrated into this package as actors, enabling the construction of workflows that perform and document various data curation tasks. The integration of Google cloud services (e.g., Google spreadsheets) allows workflow steps to invoke human experts outside the workflow in a manner that greatly simplifies the complex data handling in distributed, multi-user curation workflows. The Kepler platform provides modeling, execution, and management capabilities for the curation package, including a collection-oriented model of computation (COMAD) and provenance tracking and browsing. These features not only allow workflows to be easily modeled, maintained, and evolved, but also facilitate QA/QC of curation results through examination of the provenance information recorded during workflow execution. The effectiveness of the Kurator package is demonstrated through a workflow for data curation of natural science collections.

Similar resources

Towards Automated Design, Analysis and Optimization of Declarative Curation Workflows

Data curation is increasingly important. Our previous work on a Kepler curation package has demonstrated advantages that come from automating data curation pipelines by using workflow systems. However, manually designed curation workflows can be error-prone and inefficient due to a lack of user understanding of the workflow system, misuse of actors, or human error. Correcting problematic workfl...

Global Intelligent Content: Active Curation of Language Resources using Linked Data

As language resources start to become available in linked data formats, it becomes relevant to consider how linked data interoperability can play a role in active language processing workflows as well as for more static language resource publishing. This paper proposes that linked data may have a valuable role to play in tracking the use and generation of language resources in such workflows in...

Study of the foundation, models and issues of research data curation and management in scientific and academic environments

Background and Aim: The purpose of this paper is to study, identify, and discuss the foundations and concepts, models and frameworks, and dimensions and challenges of research data curation and management in scientific and academic environments. Method: This is a review article; the library method was used to collect scientific and research texts in this field. In this research, external an...

Preservation Aspects of a Curation-Oriented Thematic Aggregator

The emergence of the European Digital Library (Europeana) foregrounds the need to aggregate content in smarter and more efficient ways that take into account its context and production circumstances. This paper presents the main functionalities of MoRe, a curation-oriented aggregator that addresses digital preservation issues. MoRe combines aggregation, digital curation and preservation capa...

Prototype of Kepler Processing Workflows For Microscopy And Neuroinformatics

We report on progress of employing the Kepler workflow engine to prototype "end-to-end" application integration workflows that concern data coming from microscopes deployed at the National Center for Microscopy Imaging Research (NCMIR). This system is built upon the mature code base of the Cell Centered Database (CCDB) and integrated rule-oriented data system (IRODS) for distributed storage. It...

Publication date: 2012